How to Diagnose Java Resource Starvation

Using the IBM Thread & Monitor Dump Analyzer for Java

Where do thread dumps go? The IBM JVM checks each of the following locations for existence and write permission, then stores the thread dump in the first one that's available:

  1. The location specified by the IBM_JAVACOREDIR environment variable, if set (_CEE_DMPTARG on z/OS).
  2. The current working directory of the JVM process.
  3. The location specified by the TMPDIR environment variable, if set.
  4. The /tmp directory (C:\Temp on Windows).
  5. If the Javadump can't be stored in any of the above, it is written to STDERR.

Note that enough free disk space (possibly up to 2.5MB) must be available for the thread dump file to be written correctly.
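The search order above can be sketched in a few lines of Java. This is only an illustration of the lookup described in the list, not the JVM's actual code: the class and method names are made up for this example, the z/OS variable is left out, and the free-disk-space check is not modeled.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mimics the lookup order described in the article.
final class JavacoreLocationSketch {

    // Builds the candidate directories in the order the article lists them.
    static List<File> candidates(Map<String, String> env, String workingDir, boolean windows) {
        List<File> dirs = new ArrayList<>();
        String coreDir = env.get("IBM_JAVACOREDIR");
        if (coreDir != null && !coreDir.isEmpty()) {
            dirs.add(new File(coreDir));
        }
        dirs.add(new File(workingDir));
        String tmpDir = env.get("TMPDIR");
        if (tmpDir != null && !tmpDir.isEmpty()) {
            dirs.add(new File(tmpDir));
        }
        dirs.add(new File(windows ? "C:\\Temp" : "/tmp"));
        return dirs;
    }

    // Returns the first directory that exists and is writable, or null
    // when none qualifies (the case where the dump would go to STDERR).
    static File firstWritable(List<File> dirs) {
        for (File dir : dirs) {
            if (dir.isDirectory() && dir.canWrite()) {
                return dir;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        File target = firstWritable(
                candidates(System.getenv(), System.getProperty("user.dir"), windows));
        System.out.println(target != null
                ? "Likely thread dump directory: " + target
                : "No writable directory found; expect the dump on STDERR");
    }
}
```

Running it from the directory where the application was started gives a reasonable first guess about where to look for the dump files.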

Now let's analyze thread dumps with IBM Thread and Monitor Dump Analyzer for Java. You can download a copy of the latest version from http://www.alphaworks.ibm.com/tech/jca. Uncompress the package to a directory and you'll find a jar file. Here we have jca31.jar, which we'll run with a Java virtual machine.

Usage

<Java runtime environment binary path>java -Xmx[heap size] -jar jca<IBM Thread and Monitor Dump Analyzer version>.jar

To run the latest version of IBM Thread and Monitor Dump Analyzer for Java we need Java runtime environment 5.0 or higher. IBM Thread and Monitor Dump Analyzer for Java can analyze thread dumps taken from Java 1.3.1, 1.4.1, 1.4.2, 5.0, or 6.0. If you see java.lang.OutOfMemoryError while you're processing thread dumps, try increasing the maximum heap size (-Xmx) value to give the JVM more memory. To avoid performance degradation, the maximum heap size shouldn't be larger than the available physical memory.

Starting IBM® Thread and Monitor Dump Analyzer for Java version 3.1
java -Xmx500m -jar jca31.jar

Figure 2 shows the first screen from IBM Thread and Monitor Dump Analyzer for Java.

We can click on the open icon or File->Open to open the thread dumps. Select thread dumps and click on the Open button to analyze them. You don't need to worry about which vendor's Java virtual machine generated the thread dumps. IBM Thread and Monitor Dump Analyzer for Java automatically detects most Java thread dump formats, such as IBM Javacores, Solaris thread dumps, and HP-UX thread dumps (see Figure 3). Javacores are an IBM JVM format, so please don't look for them on Solaris and HP-UX systems.

Overall analysis can be done by clicking on a row in the Thread Dump List. We can get basic information such as File name, Cause of thread dump, timestamp of thread dump, Process ID, Java version, Free Java heap size, Allocated Java heap size, Memory Segment Analysis, Number of loaded classes in Java heap and Number of classloaders in the Java heap (see Figure 4).

In Figure 5, we have the Command line, Thread Status Analysis, Thread Method Analysis, and Thread Aggregation Analysis.

We see two blocked threads (pink background color) in the Thread Status Analysis.

A thread can be in any of several states:

  • Runnable: the thread can run when given the chance.
  • Condition Wait: the thread is waiting. For example, because:
    - A sleep() call is made
    - The thread has been blocked for I/O
    - A wait() method is called to wait on a monitor being notified
    - The thread is synchronizing with another thread with a join() call
  • Monitor Wait: the thread is waiting on a monitor lock. This state is no longer reported on IBM JVM 5.0 or 6.0.
  • Suspended: the thread has been suspended by another thread.
  • Zombie: the thread has been killed.
  • Blocked: the thread is waiting to obtain a lock that something else currently owns. This replaces the old Monitor Wait.
  • Parked: the thread has been parked by the new concurrency API (java.util.concurrent).
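Two of these states are easy to reproduce in a small test. The sketch below is not from the original test case. It assumes that the analyzer's "Blocked" corresponds to the JVM's Thread.State.BLOCKED and that "Parked" shows up as Thread.State.WAITING for a thread inside LockSupport.park(). The class name and the thread names are chosen only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.LockSupport;

// Illustrative only: creates one blocked and one parked thread and reports their states.
final class ThreadStateSketch {

    // Polls until the thread reaches the wanted state or about five seconds pass.
    static boolean awaitState(Thread t, Thread.State wanted) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (t.getState() != wanted) {
            if (System.nanoTime() > deadline) {
                return false;
            }
            Thread.yield();
        }
        return true;
    }

    // Returns { state of the blocked thread, state of the parked thread }.
    static String[] observeStates() {
        final Object chopstick = new Object();
        final CountDownLatch owned = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        Thread plato = new Thread(() -> {
            synchronized (chopstick) {              // Plato owns the monitor...
                owned.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    // fall through and release the monitor
                }
            }
        }, "Plato");
        Thread socrates = new Thread(() -> {
            synchronized (chopstick) { }            // ...so Socrates blocks here
        }, "Socrates");
        Thread aristotle = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                LockSupport.park();                 // parked through java.util.concurrent
            }
        }, "Aristotle");

        try {
            plato.start();
            owned.await();                          // wait until Plato holds the monitor
            socrates.start();
            aristotle.start();
            awaitState(socrates, Thread.State.BLOCKED);
            awaitState(aristotle, Thread.State.WAITING);
            return new String[] { socrates.getState().name(), aristotle.getState().name() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();                    // let Plato give up the monitor
            aristotle.interrupt();                  // wake the parked thread so it can exit
        }
    }

    public static void main(String[] args) {
        String[] states = observeStates();
        System.out.println("Socrates: " + states[0] + ", Aristotle: " + states[1]);
    }
}
```

Taking a thread dump while Plato still holds the monitor should show the same picture from the outside, although the exact wording of each state depends on the JVM vendor.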

Let's select all thread dumps and run Compare Threads by clicking on the right mouse button, icon, or Analysis->Compare Threads menu (see Figure 6).

We have four hang suspects. We can disregard the Signal Dispatcher thread since we got the thread dumps with signal 3 (see Figure 7).

Be aware that I use the word "suspect" instead of "perpetrator" because hang suspects are simply threads that have the same stack traces in multiple thread dumps. It's not unusual to see Signal Dispatcher threads have the same stack trace in multiple thread dumps.
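The idea behind a hang suspect can be expressed in a few lines of code. The following is a minimal sketch of that idea only, not the analyzer's implementation: each thread dump is assumed to be already parsed into a map from thread name to stack frames, and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: flags threads whose stack trace is identical in every dump.
final class HangSuspectSketch {

    // Each dump maps a thread name to its stack frames, top frame first.
    static Set<String> hangSuspects(List<Map<String, List<String>>> dumps) {
        Set<String> suspects = new TreeSet<>();
        if (dumps.size() < 2) {
            return suspects;                 // a single dump cannot show lack of progress
        }
        for (Map.Entry<String, List<String>> entry : dumps.get(0).entrySet()) {
            boolean sameEverywhere = true;
            for (Map<String, List<String>> later : dumps.subList(1, dumps.size())) {
                if (!entry.getValue().equals(later.get(entry.getKey()))) {
                    sameEverywhere = false;
                    break;
                }
            }
            if (sameEverywhere) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        Map<String, List<String>> first = new HashMap<>();
        first.put("Plato", Arrays.asList("Thread.sleep", "Philosopher.eat", "Philosopher.run"));
        first.put("Worker", Arrays.asList("Worker.parse"));

        Map<String, List<String>> second = new HashMap<>(first);
        second.put("Worker", Arrays.asList("Worker.write"));   // Worker made progress

        System.out.println(hangSuspects(Arrays.asList(first, second)));
    }
}
```

A thread that legitimately idles in the same place, such as the Signal Dispatcher, is flagged as well, which is exactly why the result is a list of suspects and not a verdict.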

Let's click on one of Plato's cells that has a method name. In the middle pane, we can see there are two threads waiting. We can click on each of them to see detailed information. In the right pane, we can see detailed information about this Plato thread like thread name, state, monitor, Java stack trace, and native stack trace.

In the left pane, we have a table and cells that include method name, thread state, and hang indicator. The red background color indicates that the thread is a hang suspect.

This analysis screen shows that Plato, Socrates, and Aristotle are hang suspects since they have red background colors. Aristotle and Socrates are blocked by Plato because we see Aristotle and Socrates in Plato's Waiting Thread List.

Do you remember the run() method in the Philosopher class, which calls the sleep method for two seconds when two chopsticks are available? So Plato is eating his dinner now. He's not sleeping (see Figure 8).

If we click on Socrates, which is one of the blocked threads, we can see who is blocking Socrates (see Figure 9). We already know that it's Plato. Let's click on Socrates' method cell. In the bottom of the middle pane, it shows that Plato is blocking this thread.
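The same "who is blocking whom" question can also be asked from inside a running JVM through the standard java.lang.management API, which is handy in a test harness. The sketch below is an aside and is not how the analyzer works, since the analyzer reads dump files. The class name and thread names are illustrative, and it relies on ThreadInfo.getLockOwnerName() returning the owner of a built-in monitor for a blocked thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Illustrative only: asks the JVM which thread owns the monitor another thread is blocked on.
final class LockOwnerSketch {

    static String whoBlocks(String ownerName, String waiterName) {
        final Object chopstick = new Object();
        final CountDownLatch owned = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        Thread owner = new Thread(() -> {
            synchronized (chopstick) {
                owned.countDown();
                try {
                    release.await();            // keep the monitor until the check is done
                } catch (InterruptedException ignored) {
                    // fall through and release the monitor
                }
            }
        }, ownerName);
        Thread waiter = new Thread(() -> {
            synchronized (chopstick) { }        // blocks while the owner holds the monitor
        }, waiterName);

        try {
            owner.start();
            owned.await();
            waiter.start();

            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            long deadline = System.nanoTime() + 5_000_000_000L;
            ThreadInfo info = threads.getThreadInfo(waiter.getId());
            // Poll until the waiter is actually blocked and an owner is reported.
            while (info != null && info.getLockOwnerName() == null
                    && System.nanoTime() < deadline) {
                Thread.yield();
                info = threads.getThreadInfo(waiter.getId());
            }
            return info == null ? null : info.getLockOwnerName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();                // let both threads finish
        }
    }

    public static void main(String[] args) {
        System.out.println("Socrates is blocked by " + whoBlocks("Plato", "Socrates"));
    }
}
```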

We can take a look at the problem from another perspective: monitor comparison analysis (see Figure 10). Let's start monitor comparison analysis by selecting the monitor comparison analysis icon or the Analysis->Compare Monitors menu. There's a row for the Plato thread, which means Plato owns monitors. Each cell in that row has a red background color, so Plato is also a hang suspect. A thread that is a hang suspect and owns monitors could be a problem, and if any threads are waiting for those monitors, it is a problem. If we click on any of the red cells, we can see there are two waiting threads in the middle pane. Of course you can click on each of the waiting threads to check its detailed information.

Let's close the Compare Monitors window and Compare Threads windows, select a thread dump and select the Analysis->Thread Detail menu to take a look at more details about the thread dump (see Figure 11).

This is the single thread dump analysis (see Figure 12). On the right pane, we can see the overall analysis of this thread dump. Aristotle and Socrates are blocked. Plato has an icon that indicates it owns the monitors.

Let's click on Plato's state cell to display the waiting threads (see Figure 13). We can also confirm that Plato blocks the Aristotle and Socrates threads.

There's one last window I'd like to share with readers. We can close the Thread Detail window, select a thread dump from Thread Dump List, and select Analysis->Monitor Detail menu to visualize monitors and threads in the thread dump. In the Monitor detail view of the thread dump, we can see the picture more clearly if it's not clear enough from the previous windows.

We have Plato in the tree view with two children, Socrates and Aristotle, which means Plato owns monitors that Socrates and Aristotle are trying to acquire. The numbers [2/2] in front of Plato are [TotalSize/Size]: the first 2 is the number of threads waiting on Plato directly or indirectly, and the second 2 is the number of threads waiting on Plato directly. If Socrates owned a monitor and another thread were waiting for it, that thread would be blocked indirectly by Plato's thread. In that case we would see [3/2] in front of Plato.
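The two numbers can be computed with a short recursive walk over the owner-to-waiter relationships. The sketch below only illustrates the counting rule described here. The map layout, the class name, and the extra thread "Zeno" are assumptions made for the example, and it assumes there is no cycle among the threads (a cycle would be a deadlock).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: computes the [TotalSize/Size] label for a monitor owner.
final class MonitorTreeSketch {

    // Size: threads waiting directly on monitors held by the owner.
    static int size(Map<String, List<String>> waitersByOwner, String owner) {
        return waitersByOwner.getOrDefault(owner, Collections.emptyList()).size();
    }

    // TotalSize: threads waiting directly or indirectly (assumes no cycles).
    static int totalSize(Map<String, List<String>> waitersByOwner, String owner) {
        int total = 0;
        for (String waiter : waitersByOwner.getOrDefault(owner, Collections.emptyList())) {
            total += 1 + totalSize(waitersByOwner, waiter);
        }
        return total;
    }

    static String label(Map<String, List<String>> waitersByOwner, String owner) {
        return "[" + totalSize(waitersByOwner, owner) + "/" + size(waitersByOwner, owner)
                + "] " + owner;
    }

    public static void main(String[] args) {
        Map<String, List<String>> waitersByOwner = new HashMap<>();
        waitersByOwner.put("Plato", Arrays.asList("Socrates", "Aristotle"));
        System.out.println(label(waitersByOwner, "Plato"));

        // Hypothetical nested case: a fourth thread waits on a monitor Socrates owns.
        waitersByOwner.put("Socrates", Arrays.asList("Zeno"));
        System.out.println(label(waitersByOwner, "Plato"));
    }
}
```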

TotalSize is just like totalsize in the HeapAnalyzer (see Figure 14). It's the total number of threads blocked directly or indirectly by a thread. Size is just like size in the HeapAnalyzer. That's the number of threads blocked directly by a thread. We don't have nested monitor ownership in this thread dump. So total size is the same as size in this test case.

So far we've analyzed thread dumps generated by an IBM Java virtual machine to investigate the resource starvation problem. Next we'll analyze a thread dump from a different vendor. As shown in Figure 15, let's open a sun.log file from Listing 8 using the same method as in Figure 3.

You may have noticed that there are three entries with suffixes "_1," "_2," and "_3" in the Thread Dump List even though we opened only one file, sun.log. That indicates that there are three thread dumps in sun.log. This vendor's thread dump doesn't have as much information as the IBM thread dumps provided in Figures 4 and 5. This vendor's JVM writes thread dumps to standard out whereas IBM's Java virtual machine creates separate files, or Javacores. The location of thread dumps can be quite different from one vendor's JVM to another. You might want to check the documentation of your Java virtual machine to find out where the thread dumps are and how to generate them.
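Splitting one log file into several thread dumps is straightforward if each dump starts with a recognizable header line. The sketch below assumes that header is a line beginning with "Full thread dump", which is how HotSpot-based JVMs introduce a dump. That marker is an assumption about the log format and says nothing about how the analyzer itself detects dumps. The class name is invented for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits captured standard out into individual thread dumps.
final class DumpSplitterSketch {

    // Assumed header line of each dump; adjust for your JVM's format.
    static final String HEADER = "Full thread dump";

    static List<String> split(String log) {
        List<String> dumps = new ArrayList<>();
        StringBuilder current = null;
        for (String line : log.split("\\r?\\n")) {
            if (line.startsWith(HEADER)) {
                if (current != null) {
                    dumps.add(current.toString());
                }
                current = new StringBuilder();   // start a new dump at each header
            }
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            dumps.add(current.toString());
        }
        return dumps;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java DumpSplitterSketch <log file>");
            return;
        }
        String log = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(split(log).size() + " thread dump(s) found");
    }
}
```

Anything the application printed before the first header is dropped, while output that follows a dump stays attached to that dump, so a real tool would also need an end-of-dump rule.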

Let's select three entries and invoke the thread comparison analysis as we did in Figure 6.

We can see Aristotle and Socrates are waiting on the monitor whereas they're blocked in Figure 7. Figure 16 shows a thread comparison analysis from IBM Java thread dumps/javacores. I personally prefer to see "waiting on monitor" rather than "blocked" in resource starvation problems caused by monitors because it seems more explicit than just "blocked." This is another reminder that each vendor has its own format or definition in its thread dumps.

Let's click on Plato's first method cell to see what else we can find out. From this vendor's thread dump, we can see where monitor locks are (see Figure 17).

We can see this thread locked two different monitors, [0x244dba80] (a com.ibm.jinwoo.starvation.Chopstick) and [0x244dba60] (a com.ibm.jinwoo.starvation.Chopstick) before executing line 62 of Philosopher.java. This is very useful information considering we didn't run a debugger. All we did was send a signal to this vendor's JVM process.

Java stack trace from another vendor's Java virtual machine
at java.lang.Thread.sleep(Native Method)
at com.ibm.jinwoo.starvation.Philosopher.eat(Philosopher.java:72)
at com.ibm.jinwoo.starvation.Philosopher.run(Philosopher.java:62)
- locked [0x244dba60] (a com.ibm.jinwoo.starvation.Chopstick)
- locked [0x244dba80] (a com.ibm.jinwoo.starvation.Chopstick)
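If you ever need to pull the locked monitors out of such a stack trace yourself, a regular expression is enough. The sketch below parses the "- locked" lines exactly as they are printed above. It also accepts angle brackets around the address, on the assumption that some raw dumps use them. The class name and the output format are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: extracts "- locked" entries from a textual stack trace.
final class LockedMonitorSketch {

    // Matches lines such as "- locked [0x244dba60] (a some.pkg.Type)".
    private static final Pattern LOCKED =
            Pattern.compile("\\s*- locked [\\[<](0x[0-9a-fA-F]+)[\\]>] \\(a ([^)]+)\\)\\s*");

    // Returns one "type@address" string per locked monitor, in trace order.
    static List<String> lockedMonitors(String stackTrace) {
        List<String> monitors = new ArrayList<>();
        for (String line : stackTrace.split("\\r?\\n")) {
            Matcher m = LOCKED.matcher(line);
            if (m.matches()) {
                monitors.add(m.group(2) + "@" + m.group(1));
            }
        }
        return monitors;
    }

    public static void main(String[] args) {
        String trace =
                "at java.lang.Thread.sleep(Native Method)\n"
              + "at com.ibm.jinwoo.starvation.Philosopher.eat(Philosopher.java:72)\n"
              + "at com.ibm.jinwoo.starvation.Philosopher.run(Philosopher.java:62)\n"
              + "- locked [0x244dba60] (a com.ibm.jinwoo.starvation.Chopstick)\n"
              + "- locked [0x244dba80] (a com.ibm.jinwoo.starvation.Chopstick)\n";
        for (String monitor : lockedMonitors(trace)) {
            System.out.println(monitor);
        }
    }
}
```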

We didn't see this kind of information in IBM's thread dump in Figure 13, where we also didn't see where the monitor locks were acquired. But we can see one more stack frame, java/lang/Thread.sleep(Thread.java:850), in IBM's thread dump, whereas the other vendor's thread dump has only one frame for the java/lang/Thread.sleep method.

Java stack trace from an IBM Java virtual machine
at java/lang/Thread.sleep(Native Method)
at java/lang/Thread.sleep(Thread.java:850)
at com/ibm/jinwoo/starvation/Philosopher.eat(Philosopher.java:72)
at com/ibm/jinwoo/starvation/Philosopher.run(Philosopher.java:62)

Let's click on a method cell of Socrates (see Figure 18). We can see where the thread is waiting to lock the monitor. We don't see this information in IBM's thread dump in Figure 9.

Let's invoke the thread detail analysis of a thread as we did in Figure 11. The analysis shown in Figure 19 is a little different than in Figure 12, the thread detail analysis on IBM's thread dump. We can see two threads are waiting for monitors. In Figure 12, we see two blocked threads.

You can check out the monitor detail analysis and other analysis windows to see any differences between two different JVM providers. Overall there's enough information in both types of thread dumps to diagnose resource starvation problems especially when it's due to monitor contention.

What if you don't use Java's built-in monitor and implement your own monitor? Let's experiment with this scenario. Don't worry. I'll write test classes for you. Luckily we can reuse DiningPhilosopher.java.

In this revised version of the Chopstick class, we need to add an instance variable and two instance methods to implement our own monitor. An instance variable, owner, of type Philosopher is added to indicate ownership of a chopstick. The methods pickUp() and putDown() are added for exclusive access to a chopstick.

If someone else already holds a chopstick, a philosopher will wait for one second. If nobody has picked up the chopstick, he will pick it up. When a philosopher who owns a chopstick puts it down, he sets the chopstick's owner to null and notifies. See Listing 5.
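For readers who don't have the listing at hand, here is a minimal sketch of such a hand-rolled monitor, reconstructed from the description above. It is not the original listing. The owner is stored as a String name instead of a Philosopher so that the example stays self-contained, and the enclosing class and the handOff() demo method are invented for the example.

```java
// Illustrative reconstruction of a chopstick that implements its own monitor.
final class CustomMonitorSketch {

    static final class Chopstick {
        private String owner;               // null means nobody holds the chopstick

        // Waits in one-second steps until the chopstick is free, then takes it.
        synchronized void pickUp(String philosopher) throws InterruptedException {
            while (owner != null) {
                wait(1000);                 // someone else holds it: wait, then re-check
            }
            owner = philosopher;
        }

        // Releases the chopstick if the caller owns it and wakes up a waiter.
        synchronized void putDown(String philosopher) {
            if (philosopher.equals(owner)) {
                owner = null;
                notify();
            }
        }

        synchronized String owner() {
            return owner;
        }
    }

    // Plato takes the chopstick, Socrates waits for it, Plato puts it down.
    static String handOff() {
        final Chopstick chopstick = new Chopstick();
        try {
            chopstick.pickUp("Plato");
            Thread socrates = new Thread(() -> {
                try {
                    chopstick.pickUp("Socrates");
                } catch (InterruptedException ignored) {
                    // give up without the chopstick
                }
            }, "Socrates");
            socrates.start();
            chopstick.putDown("Plato");
            socrates.join();
            return chopstick.owner();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Chopstick is now held by " + handOff());
    }
}
```

Note that the Java monitor of the Chopstick object is held only for the brief moments inside pickUp() and putDown(). The long-lived ownership lives in the owner field, which the JVM knows nothing about.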

We need to change the Philosopher class a little bit to invoke the methods pickUp() and putDown() instead of using 'synchronized ()'. You might have noticed that it's much easier to use Java's built-in monitor than to use our own monitor, but there are times when you really want to implement your own monitor for granular control.

A philosopher will pick up the left chopstick first, then the right chopstick. If both chopsticks are available, he'll eat for two seconds and put them down for the other philosophers. (Not really appetizing, is it?) See Listing 6.

We can follow the same procedures we used above to compile, run, and generate thread dumps during resource starvation.

Now it's time for the diagnosis. Let's open another vendor's thread dumps first to be fair.

Let's click on one of the thread dumps (see Figure 20). This time I redirected standard out to a file, sun.log. Surprise! We don't see any threads waiting on the monitor!! You can compare this with Figure 15 where we see two threads waiting on monitors. There's no monitor information whereas we see information on two monitors in Figure 15.

Let's select three thread dumps and invoke a thread comparison analysis as we did in Figure 6. We can see Plato, Aristotle, and Socrates are just in Object.wait() (see Figure 21). We don't see any ‘waiting on monitor.'

We can click on one of Plato's method cells to find more details (see Figure 22). We can compare this with Figure 17. There's no information about which threads are waiting.

Let's close the Compare Threads window, select a thread dump from the Thread Dump List, and select the Analysis->Monitor Detail menu to visualize the monitors and threads in a thread dump. Another surprise! You don't see anything like Figure 14. There is no monitor information at all (see Figure 23).

Can IBM's thread dumps do any better? Let's open them up. In Figure 24 we don't see any blocked threads as in Figure 5.

Let's select all the thread dumps and run a Compare Threads by clicking on the right mouse button, icon, or Analysis->Compare Threads menu (see Figure 25). In Figure 6, we saw blocked threads but we don't see them anymore.

Plato's detailed view doesn't show us any clue (see Figure 26). There are no waiting threads. There's no monitor ownership.

If you implement your own monitor and you encounter resource starvation caused by the monitor, you are on your own! None of the Java virtual machines actually analyzed application flow and provided us with any useful information to diagnose this problem. If possible, we need to use Java's built-in monitor to take advantage of information available in the thread dumps. Otherwise, you need to run a debugger. You might know that it's very challenging to run a debugger on production systems. In most cases, you're not allowed to run a debugger on production systems without any management approval.
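The reason the JVM can't help is easy to demonstrate. In the sketch below, ownership of the chopstick is recorded only in an application-level flag, so the waiting thread sits in a timed Object.wait() and the JVM reports no lock owner for it. This is an illustration under the same assumptions as the test case described in this article, using the standard java.lang.management API. The class name and the flag are invented for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative only: a thread waiting on a hand-rolled monitor has no visible lock owner.
final class HiddenOwnerSketch {

    // Returns { thread state of the waiter, lock owner name reported by the JVM }.
    static String[] observeWaiter() {
        final Object chopstick = new Object();
        final boolean[] taken = { true };       // "Plato has it", known only to the application

        Thread socrates = new Thread(() -> {
            synchronized (chopstick) {
                while (taken[0]) {
                    try {
                        chopstick.wait(1000);   // releases the Java monitor while waiting
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }, "Socrates");

        try {
            socrates.start();
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            long deadline = System.nanoTime() + 5_000_000_000L;
            ThreadInfo info = threads.getThreadInfo(socrates.getId());
            // Poll until a snapshot shows the waiter inside the timed wait.
            while (info != null && info.getThreadState() != Thread.State.TIMED_WAITING
                    && System.nanoTime() < deadline) {
                Thread.yield();
                info = threads.getThreadInfo(socrates.getId());
            }
            if (info == null) {
                throw new IllegalStateException("waiter thread is not alive");
            }
            String[] result = { info.getThreadState().name(), info.getLockOwnerName() };

            synchronized (chopstick) {          // hand the chopstick over so Socrates can finish
                taken[0] = false;
                chopstick.notifyAll();
            }
            socrates.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = observeWaiter();
        System.out.println("state=" + result[0] + ", lock owner=" + result[1]);
    }
}
```

With a built-in monitor the same query names the owning thread. Here the only trace of ownership is the flag, which is invisible to a thread dump.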

Summary
This article illustrated resource starvation caused by monitor contention, implemented two test cases to simulate resource starvation using a revised version of the Dining Philosophers Problem, and demonstrated with step-by-step instructions how IBM Thread and Monitor Dump Analyzer for Java can help diagnose a resource starvation problem.

You can diagnose a resource starvation problem with a debugger. What if you can't run a debugger in your environment? What if you can't recreate in your debugger the problem that your client reported? Thread dumps are very useful when a debugger isn't the best choice. What's even better is that we have IBM Thread and Monitor Dump Analyzer for Java to analyze thread dumps if you don't want to read through hundreds or thousands of thread stacks in raw thread dumps.

There are more features in IBM Thread and Monitor Dump Analyzer for Java that I haven't talked about. I hope I can introduce you to them in future articles.

More Stories By Jinwoo Hwang

Jinwoo Hwang is a software engineer, inventor, author, and technical leader at IBM WebSphere Application Server Technical Support in Research Triangle Park, North Carolina. He joined IBM in 1995 and worked with IBM Global Learning Services, IBM Consulting Services, and software development teams prior to his current position at IBM. He is an IBM Certified Solution Developer and IBM Certified WebSphere Application Server System Administrator as well as a SUN Certified Programmer for the Java platform. He is the architect and creator of the following technologies:

Mr. Hwang is the author of the book C Programming for Novices (ISBN:9788985553643, Yonam Press, 1995) as well as the following webcasts and articles:

Mr. Hwang is the author of the following IBM technical articles:

  • VisualAge Performance Guide, 1999
  • CORBA distributed object applet/servlet programming for IBM WebSphere Application Server and VisualAge for Java v2.0E, 1999
  • Java CORBA programming for VisualAge for Java, 1998
  • MVS/CICS application programming for VisualAge Generator, 1998
  • Oracle Native/ODBC application programming for VisualAge Generator, 1998
  • MVS/CICS application Web connection programming for VisualAge Generator, 1998
  • Java applet programming for VisualAge WebRunner, 1998
  • VisualAge for Java/WebRunner Server Works Java Servlet Programming Guide, 1998
  • RMI Java Applet programming for VisualAge for Java, 1998
  • Multimedia Database Java Applet Programming Guide, 1997
  • CICS ECI Java Applet programming guide for VisualAge Generator 3.0, 1997
  • CICS ECI DB2 Application programming guide for VisualGen, 1997
  • VisualGen CICS ECI programming guide, 1997
  • VisualGen CICS DPL programming guide, 1997

Mr. Hwang holds the following patents in the U.S. / other countries:

