From: Dmitry Sivachenko
To: Ronald Klop
Cc: freebsd-java@freebsd.org
Subject: Re: JVM BUG(s) - Hadoop's threads hanging
Date: Tue, 17 Jun 2014 14:04:52 +0400
Message-Id: <1B53E600-B745-459E-98F8-7CEF9FDE77CC@gmail.com>

On 16 June 2014, at 18:45, Ronald Klop wrote:

>
> Hi,
>
> From your information it is hard to say something about it. The bug can be in FreeBSD, OpenJDK (the Oracle part or in the BSD port part), in Hadoop or in your own code running on top of Hadoop.
>
> My first idea would be to eliminate some of the possibilities.
> - Run a Linux machine with the same versions of the software.
> - Try FreeBSD 9-stable.

I will try at least FreeBSD-9 soon (I have never used Linux, so that would take more time, and it is not so relevant because I want to continue to use FreeBSD, not just move to Linux).

> - Try an older version of OpenJDK on FreeBSD.

I already tried the latest versions of openjdk-6/7/8 from ports.
7 and 8 behave the same way (as I described in my original e-mail).
Below is the output of jstack for openjdk7 (java process running tasktracker):

46897 hadoop 147 21 0 1927M 625M uwait 22 14:31 7.86% java

/tmp# jstack -l 46897
46897: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
/tmp# jstack -F -l 46897>/tmp/jstack.out
Attaching to process ID 46897, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
        at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: sun.jvm.hotspot.debugger.UnalignedAddressException: 746f705b762f4867
        at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal$1.checkAlignment(BsdDebuggerLocal.java:183)
        at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readCInteger(BsdDebuggerLocal.java:485)
        at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:454)
        at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readAddress(BsdDebuggerLocal.java:430)
        at sun.jvm.hotspot.debugger.bsd.BsdAddress.getAddressAt(BsdAddress.java:74)
        at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:154)
        at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:573)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
        at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
        ... 6 more
/tmp#

(jstack.out file is empty)

openjdk-6 is different: during the shuffle phase (when portions of intermediate data are copied between data nodes), the java process running tasktracker consumes a lot of CPU (300-400%), and it is often in "vm map" state. Data transfer is very slow (1MB/sec or less on a 1Gbit network). With openjdk7/8 the network is about 40% utilized (~40MB/sec), which is acceptable, though the question of why it isn't 100MB/sec still stands.

So the shuffle phase is almost stuck with openjdk6. But if you wait long enough for it to finish, tasktrackers in idle state behave as expected (do not consume CPU). Below is the output of top(1) and jstack:

35291 hadoop 209 22 0 1922M 461M vm map 17 46.5H 336.08% java

/tmp# jstack -l 35291
35291: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
/tmp# jstack -F -l 35291>/tmp/jstack.out
Attaching to process ID 35291, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
        at java.lang.reflect.Method.invoke(Method.java:622)
        at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
        at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: sun.jvm.hotspot.debugger.UnalignedAddressException: 746f705b762f4867
        at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal$1.checkAlignment(BsdDebuggerLocal.java:183)
        at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readCInteger(BsdDebuggerLocal.java:480)
        at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:454)
        at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readAddress(BsdDebuggerLocal.java:425)
        at sun.jvm.hotspot.debugger.bsd.BsdAddress.getAddressAt(BsdAddress.java:74)
        at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:154)
        at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:572)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:493)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:331)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
        at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
        ... 6 more
/tmp#

(/tmp/jstack.out file is empty)

> - Try a very simple 'Hello world' style application on Hadoop which mimics the thread usage.
>
> Did you ever run your Hadoop application on FreeBSD before without this symptom? If so, what are the differences between then and now?
No, this is just my first install of Hadoop, and I use the bundled terasort test suite (hadoop jar /usr/local/share/examples/hadoop/hadoop-examples-1.2.1.jar terasort <...>).

Since the problem is with the tasktracker (it does not run user-supplied code, it just schedules tasks and performs cleanups), it is hardly relevant which particular task I execute.
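
For reference, a minimal sketch of the 'Hello world' style check suggested above. It is written as a plain stand-alone JVM program rather than a Hadoop job, which is an assumption made to keep it short. It starts a few idle threads that block on a latch and then prints a thread dump taken from inside the process via ThreadMXBean, so it does not depend on jstack being able to attach. The class name ThreadMimic, the thread names and the worker count are hypothetical, and nothing here is claimed to reproduce the tasktracker problem. It avoids lambdas and the diamond operator so that it should compile under openjdk 6, 7 and 8.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-alone test; all names are illustrative, not Hadoop code.
final class ThreadMimic {

    // Starts n daemon threads that do nothing but block on the latch.
    public static List<Thread> startIdleWorkers(int n, final CountDownLatch release) {
        List<Thread> workers = new ArrayList<Thread>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "mimic-worker-" + i);
            t.setDaemon(true);
            t.start();
            workers.add(t);
        }
        return workers;
    }

    // Builds a jstack-like dump of all live threads from inside the JVM.
    public static String dumpThreads() {
        StringBuilder out = new StringBuilder();
        ThreadInfo[] infos =
            ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip a thread that has already exited
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("        at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> workers = startIdleWorkers(8, release);
        Thread.sleep(500); // give the workers time to block on the latch
        System.out.print(dumpThreads());
        release.countDown(); // let the workers finish
        for (Thread t : workers) {
            t.join();
        }
    }
}
```

Compiled with javac and run as "java ThreadMimic" under each JDK, this gives one baseline for how idle threads look on the same machine; the mimic-worker threads are expected to be listed as blocked in CountDownLatch.await.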