A Deep Dive into the JVM Startup Procedure

This new translation from the Spring AIO team explains what happens when you run the simplest Java application: what steps the JVM takes, how many classes it needs to load just to print “Hello World!”, and what it all looks like at the bytecode level.

When you run a Java application, it is tempting to believe that the only code running at that moment is the Java bytecode passed to the JVM, i.e. the .class files compiled by javac. In reality, when an application starts, the JVM goes through a complex series of steps, creating a kind of little universe in which the application will run. In this article, we will look at all the steps the JVM goes through between typing $ java and printing the string Hello World. If you prefer video, there is also a YouTube video on the Java channel that covers the same material.

Introduction

To keep this overview of the startup procedure from turning into an attempt to “boil the ocean,” I will apply a few limitations when describing the process:

I will describe the JVM startup procedure as it happens in JDK 23. The JVM specification for Java SE 23 can be found here.
I will use the HotSpot JVM implementation as an example. It is the most commonly used JVM implementation, and many popular JDK distributions use the HotSpot JVM or a derivative of it. Alternative JVM implementations may have slightly different internal behavior.
Finally, the main code example used to describe the JVM startup procedure will be HelloWorld, because this application, although the simplest in the world, still includes all the key parts of the JVM startup procedure.

Despite these limitations, after reading this article you will have a pretty good idea of the processes the JVM goes through when it starts up, and what they are for. This knowledge will help you debug your application if it has problems at startup, and in some special cases it will be useful for improving performance, although we will talk about that closer to the end of the article.

Initializing the JVM

When the user enters the java command, the JVM startup procedure begins and the JNI (Java Native Interface) function JNI_CreateJavaVM() is called; you can view its code here. This JNI function itself carries out several important steps.

Validating user input

The first step in the JVM startup procedure is to validate the user input: JVM arguments, the artifact to execute, and the classpath. Below is a log fragment showing how the validation occurs:

[arguments] VM Arguments:
[arguments] jvm_args: -Xlog:all=trace 
[arguments] java_command: HelloWorld
[arguments] java_class_path (initial): .
[arguments] Launcher Type: SUN_STANDARD

💡 Note: You can see this log by passing the JVM argument -Xlog:all=trace.
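The same inputs can also be inspected from inside a running application. The sketch below is only an illustration using the standard management API; the class name StartupArguments is made up for this example, and the output labels merely mimic the log above.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class StartupArguments {
    /** Flags passed to the JVM itself (for example -Xlog:all=trace), not to main(). */
    public static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** The class path the JVM was started with. */
    public static String classPath() {
        return System.getProperty("java.class.path");
    }

    public static void main(String[] args) {
        System.out.println("jvm_args:        " + jvmArguments());
        System.out.println("java_class_path: " + classPath());
    }
}
```

Running it with java -Xlog:all=trace StartupArguments should list that flag among the JVM arguments.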

Discovering system resources

After validating user input, the next step is to discover the available system resources: CPUs, system memory, and system services that the JVM can use. The availability of system resources can influence the decisions the JVM makes, which are based on its internal heuristics. For example, the default garbage collector chosen by the JVM will depend on the available CPU and system memory, but in many cases the JVM’s internal heuristics can be overridden by explicitly specified arguments to the JVM.

[os       ] Initial active processor count set to 11
[gc,heap  ]   Maximum heap size 9663676416
[gc,heap  ]   Initial heap size 603979776
[gc,heap  ]   Minimum heap size 1363144
[metaspace]  - commit_granule_bytes: 65536.
[metaspace]  - commit_granule_words: 8192.
[metaspace]  - virtual_space_node_default_size: 8388608.
[metaspace]  - enlarge_chunks_in_place: 1.
[os       ] Use of CLOCK_MONOTONIC is supported
[os       ] Use of pthread_condattr_setclock is not supported
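Some of what the JVM discovered at this stage is visible to application code through java.lang.Runtime. A minimal sketch (the class name SystemResources is hypothetical):

```java
public class SystemResources {
    /** Number of processors the JVM believes it can use. */
    public static int processors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Upper bound the JVM will try to use for the heap, in bytes. */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Active processors: " + processors());
        System.out.println("Maximum heap size: " + maxHeapBytes());
    }
}
```

The values depend on the machine and on any explicit JVM arguments, such as -Xmx, that override the heuristics.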

Preparing the environment

Once the JVM has figured out what system resources are available to it, it starts preparing the environment. At this point, the HotSpot JVM generates hsperfdata (HotSpot performance data). This data is used by tools such as JConsole and VisualVM for inspecting and profiling the JVM, and it is usually stored in the system’s /tmp directory. Below is an example of the JVM creating this performance data; this continues for some time during application startup, in parallel with other processes.

[perf,datacreation] name = sun.rt._sync_Inflations, dtype = 11, variability = 2, units = 4, dsize = 8, vlen = 0, pad_length = 4, size = 56, on_c_heap = FALSE, address = 0x0000000100c2c020, data address = 0x0000000100c2c050

An important step in the JVM startup procedure is choosing a garbage collector (GC). The choice of garbage collector can have a significant impact on the performance of an application. By default, the JVM will choose between two collectors, Serial GC and G1 GC, unless other collectors are explicitly specified.

As of JDK 23, the JVM selects G1 GC by default unless the system has less than 1792 MB of memory and/or only one processor, in which case Serial GC is selected. Of course, other garbage collectors may be available for selection, including Parallel GC, ZGC, and others, depending on the JDK version and distribution used. Each of these garbage collectors has its own performance characteristics and ideal workloads.

[gc           ] Using G1
[gc,heap,coops] Trying to allocate at address 0x00000005c0000000 heap of size 0x240000000
[os,map       ] Reserved [0x00000005c0000000 - 0x0000000800000000), (9663676416 bytes)
[gc,heap,coops] Heap address: 0x00000005c0000000, size: 9216 MB, Compressed Oops mode: Zero based, Oop shift amount: 3
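The selection rule described above can be written down as a tiny function. This is a simplified model of the rule as stated in this article, not HotSpot’s actual code; the class and method names are invented for illustration. The second method uses the standard management API to ask the running JVM which collectors it really uses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcSelection {
    // Threshold taken from the text above: 1792 MB.
    static final long THRESHOLD_BYTES = 1792L * 1024 * 1024;

    /** Simplified model of the default choice; the real heuristics have more inputs. */
    public static String expectedDefault(int processors, long memoryBytes) {
        boolean enoughResources = processors >= 2 && memoryBytes >= THRESHOLD_BYTES;
        return enoughResources ? "G1" : "Serial";
    }

    /** Names of the collector beans registered in this JVM. */
    public static List<String> activeCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        long eightGb = 8L * 1024 * 1024 * 1024;
        System.out.println("Model, 4 CPUs / 8 GB: " + expectedDefault(4, eightGb));
        System.out.println("Model, 1 CPU  / 8 GB: " + expectedDefault(1, eightGb));
        System.out.println("This JVM uses: " + activeCollectors());
    }
}
```

An explicit flag such as -XX:+UseSerialGC or -XX:+UseG1GC bypasses the heuristic altogether.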

CDS

At about this point, the JVM starts looking for a CDS archive. CDS stands for Cache Data Store, formerly Class Data Sharing. A CDS archive is an archive of class files that have been preprocessed to speed up JVM startup. We’ll talk about how CDS improves JVM startup performance in the “Linking classes” section. However, don’t bother committing the “CDS” acronym to memory: it is on its way out, and we’ll come back to that when we talk about the future of the JVM startup procedure.

[cds] trying to map [Java home]/lib/server/classes.jsa
[cds] Opened archive [Java home]/lib/server/classes.jsa.

Creating the method area

One of the last steps of JVM initialization is the creation of a method area. This is a special location in off-heap memory where class data will be stored as the JVM loads it. While the method area is not located within the JVM heap, it is still managed by the garbage collector. Class data stored in the method area can be deleted if the class loader associated with the data is no longer used.

💡 Note: In the HotSpot JVM implementation, the method area is called metaspace.

[metaspace,map] Trying to reserve at an EOR-compatible address
[metaspace,map] Mapped at 0x00001fff00000000
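The off-heap memory pools, including the one that backs the method area, can be listed with the standard management API. A small sketch (the class name MethodAreaPools is hypothetical); on HotSpot one of the pools is typically named Metaspace, but the exact pool names are implementation details.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class MethodAreaPools {
    /** Names of all memory pools that live outside the Java heap. */
    public static List<String> nonHeapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getUsage() may return null if the pool is no longer valid.
            if (pool.getType() == MemoryType.NON_HEAP && pool.getUsage() != null) {
                System.out.println(pool.getName() + ": " + pool.getUsage().getUsed() + " bytes used");
            }
        }
    }
}
```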

Loading, Linking, and Initializing Classes

After the first steps, which can be called “housekeeping”, are completed, the actual JVM startup procedure begins, which includes loading classes, linking them, and initializing them.

While the JVM specification describes these processes sequentially, in sections 5.3–5.5, in the HotSpot JVM they are not required to occur in this order for a given class. As noted at the bottom of the diagram, resolution, which is part of class linking, can occur at any stage, from before verification to after class initialization. Some processes, such as class initialization, are technically not required to occur at all. We discuss these in more detail in the following sections.

Loading classes

The class loading process is described in section 5.3 of the JVM specification. Class loading is a three-step process in which the JVM finds the binary representation of a class or interface, derives the class or interface from it, and loads this information into the JVM’s method area, which, recall, is called “metaspace” in the HotSpot JVM implementation.

One of the strengths of the JVM that has made it such a popular platform is its ability to load classes dynamically, which allows it to load generated classes as needed throughout the lifetime of the JVM. This ability is used by many popular frameworks and tools, such as Spring and Mockito. In fact, even the JVM itself generates code on an as-needed basis when it uses lambdas, as it does in the InnerClassLambdaMetafactory class.
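A quick way to see a class being generated at run time is to look at the class behind a lambda. The sketch below is only an illustration (LambdaClassDemo is a made-up name); the exact name of the generated class is an implementation detail, although on HotSpot it typically contains $$Lambda.

```java
public class LambdaClassDemo {
    /** Returns the runtime class that backs a lambda expression. */
    public static Class<?> lambdaClass() {
        Runnable task = () -> { };
        return task.getClass();
    }

    public static void main(String[] args) {
        Class<?> generated = lambdaClass();
        // There is no .class file for this class on disk: it is spun up when needed.
        System.out.println("Lambda class name:   " + generated.getName());
        System.out.println("Implements Runnable: " + Runnable.class.isAssignableFrom(generated));
    }
}
```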

The JVM supports two ways of loading classes: via the bootstrap class loader (5.3.1) or via a custom class loader (5.3.2). In the second case, the loader is a class that extends java.lang.ClassLoader. In practice, custom class loaders are often defined by third-party libraries to support behavior specific to that library.

In this article, we’ll focus on the bootstrap loader, which is a special class loader written in native code and provided by the JVM. It’s instantiated late in the execution of JNI_CreateJavaVM().
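Because the bootstrap loader is native code, it has no java.lang.ClassLoader object. On HotSpot, Class.getClassLoader() returns null for classes it loaded, which gives a simple way to tell the two kinds of loader apart. A minimal sketch (LoaderDemo is a hypothetical name):

```java
public class LoaderDemo {
    /** True if the class was loaded by the bootstrap loader, which is reported as null. */
    public static boolean loadedByBootstrap(Class<?> type) {
        return type.getClassLoader() == null;
    }

    public static void main(String[] args) {
        System.out.println("String via bootstrap loader:     " + loadedByBootstrap(String.class));
        System.out.println("LoaderDemo via bootstrap loader: " + loadedByBootstrap(LoaderDemo.class));
        System.out.println("LoaderDemo was loaded by:        " + LoaderDemo.class.getClassLoader());
    }
}
```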

To better understand the class loading process, we need to look at the HelloWorld project as the JVM sees it:

public class HelloWorld extends Object {
	public static void main(String[] args) {
		System.out.println("Hello World!");
	}
}

All classes extend java.lang.Object in some way. For the JVM to load HelloWorld, it first needs to load all the classes that HelloWorld explicitly or implicitly depends on. Let’s look at the method signatures in java.lang.Object:

public class Object {
    // Signatures only; method bodies are omitted
    public Object();
    public final native Class<?> getClass();
    public native int hashCode();
    public boolean equals(Object obj);
    protected native Object clone() throws CloneNotSupportedException;
    public String toString();
    public final native void notify();
    public final native void notifyAll();
    public final void wait() throws InterruptedException;
    public final void wait(long timeoutMillis) throws InterruptedException;
    public final void wait(long timeoutMillis, int nanos) throws InterruptedException;
    protected void finalize() throws Throwable;
}

The two important methods here are public final native Class<?> getClass() and public String toString(), since each refers to another class: java.lang.Class and java.lang.String, respectively.

If we look at java.lang.String, it implements several interfaces:

public final class String
implements java.io.Serializable, Comparable<String>, CharSequence,
Constable, ConstantDesc

To load the java.lang.String class, the JVM first needs to load all the interfaces it implements. If we look at the loading log, we can see that these interfaces are loaded in the order they were declared, with java.lang.String itself loaded last:

[class,load] java.io.Serializable source: jrt:/java.base
[class,load] java.lang.Comparable source: jrt:/java.base
[class,load] java.lang.CharSequence source: jrt:/java.base
[class,load] java.lang.constant.Constable source: jrt:/java.base
[class,load] java.lang.constant.ConstantDesc source: jrt:/java.base
[class,load] java.lang.String source: jrt:/java.base

If we move on to java.lang.Class, we see that it also implements several interfaces, some of which are the same ones java.lang.String implements, namely java.io.Serializable and java.lang.constant.Constable.

public final class Class<T> 
implements java.io.Serializable,GenericDeclaration,Type,AnnotatedElement,
TypeDescriptor.OfField<Class<?>>,Constable

If we look at the JVM logs, we can see that the interfaces are again loaded in the order they were declared, except that an interface’s own superinterface is loaded before it (AnnotatedElement before GenericDeclaration, and TypeDescriptor before TypeDescriptor.OfField), and then java.lang.Class itself is loaded. java.io.Serializable and java.lang.constant.Constable are not loaded again, because they were already loaded while loading java.lang.String.

[class,load] java.lang.reflect.AnnotatedElement source: jrt:/java.base
[class,load] java.lang.reflect.GenericDeclaration source: jrt:/java.base
[class,load] java.lang.reflect.Type source: jrt:/java.base
[class,load] java.lang.invoke.TypeDescriptor source: jrt:/java.base
[class,load] java.lang.invoke.TypeDescriptor$OfField source: jrt:/java.base
[class,load] java.lang.Class source: jrt:/java.base
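The order seen in these logs can be approximated with reflection. The sketch below does not observe real class loading; it only walks the declared interfaces, visiting each interface’s superinterfaces first, which is an assumption that happens to match the logs above. The class name InterfaceOrder is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class InterfaceOrder {
    /** Interfaces of a type in declaration order, each preceded by its own superinterfaces. */
    public static List<String> loadOrder(Class<?> type) {
        List<String> order = new ArrayList<>();
        collect(type, order);
        return order;
    }

    private static void collect(Class<?> type, List<String> order) {
        // getInterfaces() returns the interfaces in the order of the implements clause.
        for (Class<?> iface : type.getInterfaces()) {
            collect(iface, order);                  // superinterfaces come first
            if (!order.contains(iface.getName())) { // nothing is listed twice
                order.add(iface.getName());
            }
        }
    }

    public static void main(String[] args) {
        loadOrder(String.class).forEach(System.out::println);
        System.out.println("---");
        loadOrder(Class.class).forEach(System.out::println);
    }
}
```

For java.lang.Class the list also contains java.io.Serializable and java.lang.constant.Constable; they are missing from the log only because they had already been loaded for java.lang.String.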

💡 Note: Normally, the JVM follows a lazy strategy for its processes, in this case for class loading. This means that a class is only loaded when it is actively referenced by another class. However, since java.lang.Object is the root from which all Java classes grow, the JVM applies an eager strategy to java.lang.Class and java.lang.String. If you look at the method signatures of java.lang.Class (JavaDoc) and java.lang.String (JavaDoc), you will notice that many of the classes they mention are not loaded when running an application like HelloWorld. For example, nothing here calls Optional<String> describeConstable(), so java.util.Optional is not loaded. This is a living example of HotSpot’s lazy strategy.

The class loading process continues throughout most of the JVM startup procedure and, in the case of a real application, into the beginning of the application’s life cycle, until it eventually completes. In total, the JVM loads about 450 classes for the HelloWorld scenario, which is why I used the analogy of a universe that the JVM creates when it starts: there really is a lot of work involved.
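If you want to check the number of loaded classes on your own machine, the standard ClassLoadingMXBean reports it. This is only a rough sketch (LoadedClassCount is a hypothetical name): using the management API loads additional classes of its own, so the figure will be higher than for a bare HelloWorld.

```java
import java.lang.management.ManagementFactory;

public class LoadedClassCount {
    /** Number of classes currently loaded in this JVM. */
    public static int loadedNow() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("Classes loaded so far: " + loadedNow());
    }
}
```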

Let’s continue diving into the universe of the JVM startup procedure and look at class linking.

Linking classes

Class linking, described in section 5.4 of the JVM specification, is one of the most complex processes, as it involves three separate subprocesses:

Verification - 5.4.1
Preparation - 5.4.2
Resolution - 5.4.3

There are three more processes within class linking: Access Control, Method Overriding, and Method Selection, but we will not discuss them in this article.

Going back to the diagram, verification, preparation, and resolution do not necessarily happen in the order in which they are described in this article. Resolution may happen before verification, but it may also happen much later, after the class has been initialized.

Verification

Verification (5.4.1) is the process by which the JVM ensures that a class or interface is structurally correct. This process may cause other classes to be loaded if necessary, although classes loaded this way do not themselves require verification or preparation.

Getting back to the CDS topic, in most normal situations, JDK classes will not go through an active verification phase. This is because one of the benefits of CDS is that it pre-verifies classes inside the archive, which reduces the amount of work the JVM has to do at startup. This in turn improves startup performance.

If you want to learn more about CDS, you can watch my Stack Walker video on the topic, read our dev.java articles on CDS, or read this inside.java article, which explains how to include your application’s classes in a CDS archive.

One of the classes that does need to be verified is HelloWorld, and we can see the JVM doing this in the following logs:

[class,init             ] Start class verification for: HelloWorld
[verification           ] Verifying class HelloWorld with new format
[verification           ] Verifying method HelloWorld.<init>()V
[verification           ] table = { 
[verification           ]  }
[verification           ] bci: @0
[verification           ] flags: { flagThisUninit }
[verification           ] locals: { uninitializedThis }
[verification           ] stack: { }
[verification           ] offset = 0,  opcode = aload_0
[verification           ] bci: @1

Preparation

Preparation (5.4.2) is responsible for initializing the static fields of a class to their default values.

To better understand how this works, let’s look at this simple example class:

class MyClass {
  static int myStaticInt = 10; //Initialized to 0
  static int myStaticInitializedInt; //Initialized to 0
  int myInstanceInt = 30; //Not initialized
  static {
    myStaticInitializedInt = 20;
  }
}

The class contains three integer fields: myStaticInt, myStaticInitializedInt, and myInstanceInt.

In this example, both myStaticInt and myStaticInitializedInt are initialized to 0, the default value for the primitive type int.

The myInstanceInt field is not initialized at this stage, since it is an instance field, not a class field.

A little later we’ll talk about when myStaticInt and myStaticInitializedInt are set to the values 10 and 20.
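The default value assigned during preparation can actually be observed from Java code by reading a static field before its initializer has run. The sketch below is an illustration only (PreparationDemo and its members are invented names): static initializers run in textual order, so the first field calls peek() while myStaticInt still holds its default.

```java
public class PreparationDemo {
    // Runs first: at this moment myStaticInt still holds the default value 0.
    public static int seenBeforeAssignment = peek();

    // Runs second: only now is 10 assigned.
    public static int myStaticInt = 10;

    private static int peek() {
        return myStaticInt;
    }

    public static void main(String[] args) {
        System.out.println("Value seen before the initializer ran: " + seenBeforeAssignment);
        System.out.println("Value after class initialization:      " + myStaticInt);
    }
}
```

Reading the field through a method is what makes this compile; referring to myStaticInt directly in the first initializer would be rejected by javac as an illegal forward reference.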

Resolution

The purpose of the resolution process (5.4.3) is to resolve the symbolic references in a class’s constant pool so that JVM instructions can use them.

To better understand this process, we will use javap, a standard JDK command-line tool for disassembling Java .class files. When run with the -verbose option, it gives us an idea of how the JVM interprets the classes it loads. Let’s run javap on MyClass:

$ javap -verbose MyClass

The result of this command is long, so it is shown below in fragments.

💡 Note: This output has been trimmed slightly to remove metadata not relevant to the topic of this article.

There’s quite a lot of data here, so let’s break it down and go through it step by step to understand what it all means.

The snippet below is the default constructor of MyClass, which is generated automatically. It starts by calling the default constructor of the parent class, java.lang.Object, and then sets myInstanceInt to 30.

MyClass();
  descriptor: ()V
  flags: (0x0000)
  Code:
    stack=2, locals=1, args_size=1
       0: aload_0
       1: invokespecial #1 //Method java/lang/Object."<init>":()V
       4: aload_0
       5: bipush        30
       7: putfield      #7 //Field myInstanceInt:I
      10: return
    LineNumberTable:
      line 1: 0
      line 4: 4

💡 Note: You’ve probably noticed aload_0, invokespecial, bipush, putfield, etc. These are JVM instructions, or opcodes, that the JVM uses to do its work.

To the right of invokespecial and putfield are the numbers #1 and #7, respectively. These are references into the constant pool of MyClass (4.4). Let’s take a closer look at it:

Constant pool:
   #1 = Methodref          #2.#3          // java/lang/Object."<init>":()V
   #2 = Class              #4             // java/lang/Object
   #3 = NameAndType        #5:#6          // "<init>":()V
   #4 = Utf8               java/lang/Object
   #5 = Utf8               <init>
   #6 = Utf8               ()V
   #7 = Fieldref           #8.#9          // MyClass.myInstanceInt:I
   #8 = Class              #10            // MyClass
   #9 = NameAndType        #11:#12        // myInstanceInt:I
  #10 = Utf8               MyClass
  #11 = Utf8               myInstanceInt
  #12 = Utf8               I
  #13 = Fieldref           #8.#14         // MyClass.myStaticInt:I
  #14 = NameAndType        #15:#12        // myStaticInt:I
  #15 = Utf8               myStaticInt
  #16 = Fieldref           #8.#17         // MyClass.myStaticInitializedInt:I
  #17 = NameAndType        #18:#12        // myStaticInitializedInt:I
  #18 = Utf8               myStaticInitializedInt
  #19 = Utf8               Code
  #20 = Utf8               LineNumberTable
  #21 = Utf8               <clinit>
  #22 = Utf8               SourceFile
  #23 = Utf8               MyClass.java
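The constant pool sits near the very start of a .class file: a 4-byte magic number, the minor and major version, and then constant_pool_count, which is one greater than the highest entry index (JVM specification, section 4.1). The sketch below reads just that header. It is an illustration, not a class-file parser; the class name ClassFileHeader is made up, and the bytes in main() are a hand-written header (major version 67 corresponds to Java 23, and a count of 24 matches the 23 entries listed above), not the real contents of MyClass.class.

```java
import java.nio.ByteBuffer;

public class ClassFileHeader {
    public final int majorVersion;
    public final int constantPoolCount;

    private ClassFileHeader(int majorVersion, int constantPoolCount) {
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads magic, minor_version, major_version and constant_pool_count. */
    public static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        in.getShort();                               // minor_version, ignored here
        int major = in.getShort() & 0xFFFF;          // unsigned 16-bit value
        int count = in.getShort() & 0xFFFF;          // unsigned 16-bit value
        return new ClassFileHeader(major, count);
    }

    /** Valid constant pool indices run from 1 to constant_pool_count - 1. */
    public int lastConstantPoolIndex() {
        return constantPoolCount - 1;
    }

    public static void main(String[] args) {
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0, 0,                                               // minor_version
            0, 67,                                              // major_version
            0, 24                                               // constant_pool_count
        };
        ClassFileHeader parsed = parse(header);
        System.out.println("Major version: " + parsed.majorVersion);
        System.out.println("Last constant pool index: #" + parsed.lastConstantPoolIndex());
    }
}
```

To inspect a real class, pass the bytes of a compiled file instead, for example the result of Files.readAllBytes on the path to MyClass.class.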

The constant pool of MyClass contains all of its symbolic references. For the JVM to execute the invokespecial instruction, it must resolve the reference to the default constructor of java.lang.Object. Looking at the constant pool, entries #1 through #6 provide the information needed to resolve it.

💡 Note: <init> is a special method that javac generates automatically for each constructor in the class.

The same pattern is repeated for the putfield instruction, which refers to entry #7 in the constant pool; together with entries #8 through #12, it provides the information needed to resolve the reference used to set the value of myInstanceInt. For more information on the constant pool, see the relevant section of the JVM specification.

The reason resolution can happen either before or after class initialization is that it is performed lazily, only when the JVM attempts to execute an instruction that refers to the class. Not every loaded class has instructions executed against it. For example, java.lang.SecurityManager may be loaded but never used, because it is deprecated and being phased out. It is also possible that there is nothing to initialize in a class, in which case the JVM automatically marks it as initialized. Which brings us to the topic of class initialization…

Initialization of classes

We finally get to class initialization, which is covered in section 5.5 of the JVM specification. Class initialization involves assigning ConstantValue values to static fields and executing the class’s static initialization blocks, if any. This process begins when the JVM executes a new, getstatic, putstatic, or invokestatic instruction that refers to the class.

The class is initialized by a special no-argument method, void <clinit>, which, like <init>, is generated automatically by javac. The angle brackets (< >) are included intentionally: they are not valid characters in a Java method name, which prevents Java users from writing their own <init> or <clinit> methods.

There is no guarantee that <clinit> will be generated, since it is only needed if the class has static initialization blocks or static fields to initialize. If the class has neither, <clinit> is not generated, and the JVM immediately marks the class as initialized when new is executed on it, essentially skipping the class initialization step. This is how resolution can occur after the class is initialized.

Since MyClass has two static fields and a static initialization block, it has a <clinit> method, which brings us back to the output of javap:

  static {};
    descriptor: ()V
    flags: (0x0008) ACC_STATIC
    Code:
      stack=1, locals=0, args_size=0
         0: bipush        10
         2: putstatic     #13                 // Field myStaticInt:I
         5: bipush        20
         7: putstatic     #16                 // Field myStaticInitializedInt:I
        10: return
      LineNumberTable:
        line 2: 0
        line 6: 5
        line 7: 10

The structure of <clinit> is similar to that of <init>, but without the call to the parent class constructor and with putstatic in place of putfield.
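The point at which <clinit> runs can be made visible with a static initialization block that records an event. In the sketch below (all names are invented for illustration), merely mentioning the class through a class literal does not initialize it, while reading a non-final static field, which compiles to getstatic, does.

```java
import java.util.ArrayList;
import java.util.List;

public class InitializationDemo {
    public static final List<String> EVENTS = new ArrayList<>();

    static class Lazy {
        static int value = 20;                   // non-final, so reads compile to getstatic
        static { EVENTS.add("Lazy.<clinit>"); }  // runs as part of class initialization
    }

    public static List<String> run() {
        EVENTS.add("start");
        Class<?> type = Lazy.class;              // class literal: no initialization yet
        EVENTS.add("class literal " + type.getSimpleName());
        int value = Lazy.value;                  // getstatic: triggers initialization on first use
        EVENTS.add("getstatic " + value);
        return EVENTS;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

On the first call, the Lazy.<clinit> event appears between the class literal event and the getstatic event; a class is initialized only once, so later calls do not record it again.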

Hello World!

Sooner or later, the JVM will have done all the preparatory work needed to start executing the user code inside public static void main(), where the Hello World! message lives:

[0.062s][debug][class,resolve] java.io.FileOutputStream
... 
Hello World!

In total, the JVM will load about 450 classes, and some of those classes will also be linked and initialized. On my M4 MacBook Pro, as you can see from the logs, the entire process took only 62 milliseconds, even with VERY verbose logging. The full log can be found on GitHub.

Project Leyden

These are very interesting times for the JVM startup process. The process has been continuously improving with each release, and starting with JDK 24, the first work from Project Leyden will be included in the JDK release on its main branch.

Project Leyden aims to reduce startup time, time to peak performance, and memory footprint. It grew out of CDS and continues its legacy. As Project Leyden is integrated, CDS will gradually give way to AOT (ahead-of-time). Project Leyden’s capabilities will allow you to record the behavior of the JVM during a training run, store this information in a cache, and then load it from the cache on subsequent runs. If you want to learn more about Project Leyden, be sure to watch this video.

The major feature of Project Leyden will be JEP 483: Ahead-of-Time Class Loading & Linking. We’ve already covered class loading and linking in this article, so the benefits of doing this work ahead of time instead of at startup should be obvious.

Conclusion

As you can see from this article, the JVM startup procedure is very complex. The ability to react to the available system resources, provide the means to inspect and profile the JVM, load classes dynamically, and much more all adds considerable complexity to the procedure.

What can we take away from all this, other than a deeper understanding of the JVM? At least two aspects are worth paying attention to, debugging and performance, although their applicability may be somewhat limited.

Debugging

The JVM startup procedure is fairly robust, and typically if an error occurs it is due to user error or perhaps some problem in a third-party library. Hopefully, a deeper understanding of what the JVM is trying to do and why can help you troubleshoot the most intractable or difficult-to-understand startup problems.

Performance improvements

Another potential benefit is that, armed with this knowledge, you may find small opportunities to improve your application’s startup performance. With JEP 483 being integrated into JDK 24, performing class loading and linking ahead of time can improve startup performance even further.

However, I would remind you that in most cases, the “first party” code (i.e. yours) is only a small part of the code that the JVM runs. With all the libraries, frameworks, and the JDK itself, your application code is often just the tip of the iceberg.
