Date: Tue, 17 Jun 2014 14:04:52 +0400
From: Dmitry Sivachenko <trtrmitya@gmail.com>
To: Ronald Klop <ronald-lists@klop.ws>
Cc: freebsd-java@freebsd.org
Subject: Re: JVM BUG(s) - Hadoop's threads hanging
Message-ID: <1B53E600-B745-459E-98F8-7CEF9FDE77CC@gmail.com>
In-Reply-To: <op.xhjxxemtkndu52@ronaldradial.radialsg.local>
References: <E14F86A5-C7FE-49B5-8A11-F5237C557AE2@gmail.com> <op.xhjxxemtkndu52@ronaldradial.radialsg.local>

On 16 June 2014, at 18:45, Ronald Klop <ronald-lists@klop.ws> wrote:
>
> Hi,
>
> From your information it is hard to say anything about it. The bug
> can be in FreeBSD, OpenJDK (the Oracle part or the BSD port part), in
> Hadoop or in your own code running on top of Hadoop.
>
> My first idea would be to eliminate some of the possibilities.
> - Run a Linux machine with the same versions of the software.
> - Try FreeBSD 9-stable.
I will try at least FreeBSD 9 soon. (I have never used Linux, so that
will take more time, and it is less relevant because I want to keep
using FreeBSD rather than move to Linux.)
> - Try an older version of OpenJDK on FreeBSD.
I have already tried the latest versions of openjdk-6/7/8 from ports.
7 and 8 behave the same way (as I described in my original e-mail).
Below is the output of top(1) and jstack for openjdk7 (the java process
running tasktracker):
46897 hadoop 147 21 0 1927M 625M uwait 22 14:31 7.86% java
/tmp# jstack -l 46897
46897: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
/tmp# jstack -F -l 46897>/tmp/jstack.out
Attaching to process ID 46897, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
    at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: sun.jvm.hotspot.debugger.UnalignedAddressException: 746f705b762f4867
    at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal$1.checkAlignment(BsdDebuggerLocal.java:183)
    at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readCInteger(BsdDebuggerLocal.java:485)
    at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:454)
    at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readAddress(BsdDebuggerLocal.java:430)
    at sun.jvm.hotspot.debugger.bsd.BsdAddress.getAddressAt(BsdAddress.java:74)
    at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:154)
    at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:573)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
    at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
    at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
    ... 6 more
/tmp#
(jstack.out file is empty)
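
A side observation on the value reported by UnalignedAddressException: every byte of 746f705b762f4867 is printable ASCII, and the value is not a multiple of 8. The small sketch below just decodes it. The class and method names (AddressBytes, asAscii, isAligned) are made up for illustration, and the 8-byte alignment requirement is an assumption about what the debugger checks on a 64-bit platform. It may hint that the tool read character data where it expected a pointer, but that is only a guess, not a diagnosis.

```java
import java.nio.charset.StandardCharsets;

public class AddressBytes {
    // Render the 8 bytes of a 64-bit value as ASCII, most significant byte first.
    static String asAscii(long value) {
        byte[] bytes = new byte[8];
        for (int i = 0; i < 8; i++) {
            bytes[i] = (byte) (value >>> (56 - 8 * i));
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // True if value is a multiple of size; size must be a power of two.
    static boolean isAligned(long value, int size) {
        return (value & (size - 1)) == 0;
    }

    public static void main(String[] args) {
        // The value copied from the "Caused by:" line of the jstack output.
        long addr = Long.parseLong("746f705b762f4867", 16);
        System.out.println(isAligned(addr, 8));   // assumed 8-byte pointer alignment
        System.out.println(asAscii(addr));        // bytes in the order printed
        // Same bytes in reverse, i.e. memory order if the host is little-endian (assumption).
        System.out.println(new StringBuilder(asAscii(addr)).reverse());
    }
}
```

The last hex digit pair is 67, so the value cannot be 8-byte aligned, which is consistent with checkAlignment throwing.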
openjdk-6 is different: during the shuffle phase (when portions of
intermediate data are copied between data nodes), the java process
running tasktracker consumes a lot of CPU (300-400%) and is often in
the "vm map" state. Data transfer is very slow (1 MB/sec or less on a
1 Gbit/s network). With openjdk7/8 the network is utilized at about 40%
(~40 MB/sec), which is acceptable, though the question of why it isn't
100 MB/sec still stands. So the shuffle phase is almost stuck with
openjdk6. But if you wait long enough for it to finish, tasktrackers in
the idle state behave as expected (they do not consume CPU). Below is
the output of top(1) and jstack:
35291 hadoop 209 22 0 1922M 461M vm map 17 46.5H 336.08% java
/tmp# jstack -l 35291
35291: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
/tmp# jstack -F -l 35291>/tmp/jstack.out
Attaching to process ID 35291, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
    at java.lang.reflect.Method.invoke(Method.java:622)
    at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
    at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: sun.jvm.hotspot.debugger.UnalignedAddressException: 746f705b762f4867
    at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal$1.checkAlignment(BsdDebuggerLocal.java:183)
    at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readCInteger(BsdDebuggerLocal.java:480)
    at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:454)
    at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readAddress(BsdDebuggerLocal.java:425)
    at sun.jvm.hotspot.debugger.bsd.BsdAddress.getAddressAt(BsdAddress.java:74)
    at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:154)
    at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:572)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:493)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:331)
    at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
    at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
    ... 6 more
/tmp#
(/tmp/jstack.out file is empty)
> - Try a very simple 'Hello world' style application on Hadoop which
>   mimics the thread usage.
>
> Did you ever run your Hadoop application on FreeBSD before without
> this symptom? If so, what are the differences between then and now?
No, this is my first install of Hadoop, and I use the bundled terasort
test suite (hadoop jar
/usr/local/share/examples/hadoop/hadoop-examples-1.2.1.jar terasort
<...>).
Since the problem is with tasktracker (it does not run user-supplied
code, it just schedules tasks and performs cleanups), it is hardly
relevant which particular task I execute.
