Date: Tue, 17 Jun 2014 14:04:52 +0400
From: Dmitry Sivachenko <trtrmitya@gmail.com>
To: Ronald Klop <ronald-lists@klop.ws>
Cc: freebsd-java@freebsd.org
Subject: Re: JVM BUG(s) - Hadoop's threads hanging
Message-ID: <1B53E600-B745-459E-98F8-7CEF9FDE77CC@gmail.com>
In-Reply-To: <op.xhjxxemtkndu52@ronaldradial.radialsg.local>
References: <E14F86A5-C7FE-49B5-8A11-F5237C557AE2@gmail.com> <op.xhjxxemtkndu52@ronaldradial.radialsg.local>

On 16 June 2014, at 18:45, Ronald Klop <ronald-lists@klop.ws> wrote:

> Hi,
>
> From your information it is hard to say something about it. The bug can be in FreeBSD, OpenJDK (the Oracle part or in the BSD port part), in Hadoop or in your own code running on top of Hadoop.
>
> My first idea would be to eliminate some of the possibilities.
> - Run a Linux machine with the same versions of the software.
> - Try FreeBSD 9-stable.

I will try at least FreeBSD-9 soon. (I have never used Linux, so that will take more time, and it is not so relevant because I want to continue using FreeBSD, not just move to Linux.)

> - Try an older version of OpenJDK on FreeBSD.

I already tried the latest versions of openjdk-6/7/8 from ports.
7 and 8 behave the same way (as I described in my original e-mail). Below is the output of jstack for openjdk7 (java process running tasktracker):

46897 hadoop 147 21 0 1927M 625M uwait 22 14:31 7.86% java

/tmp# jstack -l 46897
46897: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
/tmp# jstack -F -l 46897>/tmp/jstack.out
Attaching to process ID 46897, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
	at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: sun.jvm.hotspot.debugger.UnalignedAddressException: 746f705b762f4867
	at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal$1.checkAlignment(BsdDebuggerLocal.java:183)
	at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readCInteger(BsdDebuggerLocal.java:485)
	at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:454)
	at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readAddress(BsdDebuggerLocal.java:430)
	at sun.jvm.hotspot.debugger.bsd.BsdAddress.getAddressAt(BsdAddress.java:74)
	at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:154)
	at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
	at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:573)
	at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
	at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
	at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
	at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
	... 6 more
/tmp#

(jstack.out file is empty)

openjdk-6 is different: during the shuffle phase (when portions of intermediate data are copied between data nodes), the java process running tasktracker consumes a lot of CPU (300-400%), and it is often in the "vm map" state. Data transfer is very slow (1MB/sec or less on a 1GB network). With openjdk7/8 the network is utilized at about 40% (~40MB/sec); that is acceptable, though the question of why it isn't 100MB/sec still stands.

So the shuffle phase is almost stuck with openjdk6.
But if you wait long enough for it to finish, tasktrackers in the idle state behave as expected (they do not consume CPU). Below is the output of top(1) and jstack:

35291 hadoop 209 22 0 1922M 461M vm map 17 46.5H 336.08% java

/tmp# jstack -l 35291
35291: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
/tmp# jstack -F -l 35291>/tmp/jstack.out
Attaching to process ID 35291, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
	at java.lang.reflect.Method.invoke(Method.java:622)
	at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
	at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: sun.jvm.hotspot.debugger.UnalignedAddressException: 746f705b762f4867
	at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal$1.checkAlignment(BsdDebuggerLocal.java:183)
	at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readCInteger(BsdDebuggerLocal.java:480)
	at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:454)
	at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readAddress(BsdDebuggerLocal.java:425)
	at sun.jvm.hotspot.debugger.bsd.BsdAddress.getAddressAt(BsdAddress.java:74)
	at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:154)
	at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
	at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:572)
	at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:493)
	at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:331)
	at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
	at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
	... 6 more
/tmp#

(/tmp/jstack.out file is empty)

> - Try a very simple 'Hello world' style application on Hadoop which mimics the thread usage.
>
> Did you ever run your Hadoop application on FreeBSD before without this symptom? If so, what are the differences between then and now?

No, this is my first install of hadoop, and I use the bundled terasort test suite (hadoop jar /usr/local/share/examples/hadoop/hadoop-examples-1.2.1.jar terasort <...>).

Since the problem is with tasktracker (it does not run user-supplied code, it just schedules tasks and performs cleanups), it is hardly relevant which particular task I execute.
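
For the 'Hello world' style test suggested in the quoted message, a minimal sketch in plain Java might look like the following. It assumes nothing about Hadoop or about how tasktracker actually uses its threads: the class name ThreadDumpSketch, the thread names and the worker count are all hypothetical. It parks a few idle threads and prints an in-process dump with Thread.getAllStackTraces(), which needs neither the attach socket used by "jstack -l" nor the serviceability agent used by "jstack -F". It avoids lambdas so that it should also compile with openjdk-6.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-alone test program; not part of Hadoop.
class ThreadDumpSketch {

    // Start `count` daemon threads that block on a latch, mimicking idle workers.
    // Counting the returned latch down releases all of them.
    static CountDownLatch startIdleWorkers(int count) {
        final CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "idle-worker-" + i);
            t.setDaemon(true);
            t.start();
        }
        return release;
    }

    // Build a jstack-like text dump of every live thread in this JVM.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CountDownLatch release = startIdleWorkers(4);
        System.out.println("Hello world");
        System.out.print(dumpAllThreads());
        release.countDown();
    }
}
```

If a program like this also shows high CPU usage or cannot be inspected with jstack on the same machine, that would point away from Hadoop and toward the JVM port or the OS; if it behaves normally, it only rules out the simplest case.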